feat(perf): enhance performance by improving disk caching and applying other optimizations #556
Conversation
Force-pushed from af2170a to 221fb3d
Add persistent namespace caching to significantly improve warm start performance. Cached namespaces are loaded from disk instead of being rebuilt from scratch.

Key changes:
- Add NamespaceMetaData and NamespaceCacheData frozen dataclasses for cache serialization with validation fields (mtime, content_hash, python_executable, sys_path_hash)
- Add atomic cache writes using the temp file + rename pattern
- Add reverse dependency tracking for efficient library/variable change propagation (get_library_users, get_variables_users)
- Skip content hash computation when mtime AND size match
- Add ResourceMetaData for resource caching

Tests:
- Unit tests for PickleDataCache atomic writes (28 tests)
- Unit tests for NamespaceMetaData and cached entries (20 tests)
- Unit tests for ResourceMetaData cache keys (15 tests)
- Integration tests for namespace caching behavior (11 tests)
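A minimal sketch of the temp file + rename pattern named above. The PickleDataCache internals are not shown in this PR excerpt, so the function name and paths here are illustrative:

```python
import os
import pickle
import tempfile
from pathlib import Path


def atomic_pickle_write(path: Path, data: object) -> None:
    """Write pickle data atomically: readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on the same filesystem and remains a single atomic operation.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)  # never leave half-written cache files behind
        raise
```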
Force-pushed from 221fb3d to ec8f008
Optimize workspace-wide reference operations from O(D) to O(k), where D = total documents and k = documents actually referencing the target.

Changes:
- Add reverse index data structures in DocumentsCacheHelper to track which documents reference each keyword/variable
- Use stable (source, name) tuple keys resilient to cache invalidation
- Implement diff-based updates to handle removed references after edits
- Add get_keyword_ref_users() and get_variable_ref_users() for O(1) lookup
- Update Find References to use the reverse index with a workspace scan fallback
- Update unused keyword/variable detection to use the reverse index
I've included the second performance optimization, which can be found in this commit: 59e0314

Problem

Previously, workspace-wide reference operations (Find References, unused keyword/variable detection) required scanning all documents in the workspace for each lookup. This resulted in O(D) per-lookup complexity, where D is the total number of documents. Unused-detection checking of K keywords therefore required O(K × D) operations, causing noticeable delays in large workspaces.

Solution

Added a reverse index that maps each keyword/variable to the documents that reference it. This reduces lookup complexity from O(D) to O(k), where k is the number of documents that actually use the target (typically much smaller than D).

Architecture

Before: O(D) Workspace Scan

```mermaid
flowchart LR
subgraph "Find References (Before)"
A[Request: Find refs to 'My Keyword'] --> B[Scan ALL documents]
B --> C[doc_001.robot]
B --> D[doc_002.robot]
B --> E[doc_003.robot]
B --> F[...]
B --> G[doc_999.robot]
C --> H[Check namespace]
D --> H
E --> H
F --> H
G --> H
H --> I[Return matches]
end
```

After: O(k) Reverse Index Lookup

```mermaid
flowchart LR
subgraph "Find References (After)"
A[Request: Find refs to 'My Keyword'] --> B[Lookup reverse index]
B --> C["index[('source', 'My Keyword')]"]
C --> D[Return: doc_003, doc_047, doc_891]
D --> E[Scan only 3 documents]
E --> F[Return matches]
end
```

Data Structures

```mermaid
flowchart TB
subgraph "Reverse Index Structure"
direction TB
A["_keyword_ref_users<br/>dict[tuple[str, str], WeakSet[TextDocument]]"]
B["Key: (source, name)<br/>e.g. ('common.resource', 'My Keyword')"]
C["Value: WeakSet of documents<br/>that reference this keyword"]
A --> B
A --> C
end
subgraph "Forward Index (for diff-based updates)"
direction TB
D["_doc_keyword_refs<br/>WeakKeyDictionary[TextDocument, set]"]
E["Key: TextDocument"]
F["Value: set of (source, name) tuples<br/>this document references"]
D --> E
D --> F
end
```
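A minimal sketch of the reverse/forward index pair shown above, assuming a simplified stand-in TextDocument class; the real DocumentsCacheHelper tracks more state than this:

```python
from __future__ import annotations

from weakref import WeakKeyDictionary, WeakSet

RefKey = tuple[str, str]  # (source, name), e.g. ("common.resource", "My Keyword")


class TextDocument:  # stand-in for the real document class
    def __init__(self, uri: str) -> None:
        self.uri = uri


class ReverseRefIndex:
    def __init__(self) -> None:
        # Reverse index: (source, name) -> documents that reference it.
        self._keyword_ref_users: dict[RefKey, WeakSet[TextDocument]] = {}
        # Forward index: document -> keys it references (enables diff updates).
        self._doc_keyword_refs: WeakKeyDictionary[TextDocument, set[RefKey]] = (
            WeakKeyDictionary()
        )

    def update_document_refs(self, doc: TextDocument, new_refs: set[RefKey]) -> None:
        """Diff-based update: drop stale entries, then add new ones."""
        old_refs = self._doc_keyword_refs.get(doc, set())
        for key in old_refs - new_refs:  # references removed after an edit
            users = self._keyword_ref_users.get(key)
            if users is not None:
                users.discard(doc)
        for key in new_refs - old_refs:
            self._keyword_ref_users.setdefault(key, WeakSet()).add(doc)
        self._doc_keyword_refs[doc] = new_refs

    def get_keyword_ref_users(self, key: RefKey) -> WeakSet[TextDocument]:
        """O(1) lookup of the documents that reference a keyword."""
        return self._keyword_ref_users.get(key, WeakSet())
```

Find References then scans only get_keyword_ref_users(("common.resource", "My Keyword")) instead of the whole workspace; the WeakSet values let garbage-collected documents drop out of the index automatically.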
Extend namespace disk caching to include keyword references, variable references, and local variable assignments. This allows the analysis phase to be skipped entirely when loading from a valid cache.

Key changes:
- Add KeywordRefKey and VariableRefKey stable keys for serialization
- Serialize/restore keyword_references, variable_references, and local_variable_assignments in the namespace cache
- Implement a 10% staleness threshold: if more than 10% of cached references cannot be resolved, fall back to fresh analysis (see the sketch below)
- Track references when loading fully analyzed namespaces from cache
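A minimal sketch of the 10% staleness threshold, assuming a hypothetical resolve_reference callback that re-resolves a cached key against the current workspace:

```python
STALENESS_THRESHOLD = 0.10  # fall back to fresh analysis above 10% unresolved


def restore_references(cached_refs, resolve_reference):
    """Try to resolve cached reference keys; report staleness if too many fail.

    Returns the resolved references, or None if the cache is too stale
    and the caller should rerun the full analysis phase.
    """
    if not cached_refs:
        return {}
    resolved = {}
    unresolved = 0
    for key in cached_refs:
        target = resolve_reference(key)  # e.g. look up (source, name) again
        if target is None:
            unresolved += 1
        else:
            resolved[key] = target
    if unresolved / len(cached_refs) > STALENESS_THRESHOLD:
        return None  # too stale: caller falls back to fresh analysis
    return resolved
```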
- Format 4 source files to pass ruff style checks
- Fix filepath_base tests to use Path for cross-platform compatibility instead of hardcoded Unix byte strings for hash computation
Summary
This PR implements a comprehensive performance improvement plan that dramatically reduces Language Server startup time and improves edit responsiveness.
Key changes:
- Persist LibraryDoc objects to .robotcode_cache/*/
- Cache resource docs under resource/
- Share a single process pool for library loading (max_workers=min(cpu_count, 4))
- Add get_library_users() and get_variables_users() for instant change propagation

Testing Plan
Architecture
Architecture Overview
```mermaid
flowchart TB
    subgraph cache["Disk Cache (.robotcode_cache/)"]
        lib["libdoc/<br/>Library docs"]
        res["resource/<br/>Resource docs"]
        ns["namespace/<br/>Namespace state"]
    end
    subgraph perf["Performance Optimizations"]
        p1["Shared Process Pool"]
        p2["Resource Caching"]
        p3["Targeted Invalidation"]
        p4["Parallel Library Loading"]
        p5["O(1) Dependency Lookups"]
        p7["Namespace Caching"]
        p9["Atomic Writes"]
    end
    subgraph maps["Reverse Dependency Maps"]
        imp["_importers<br/>source → documents"]
        lu["_library_users<br/>lib → documents"]
        vu["_variables_users<br/>var → documents"]
    end
    p1 --> lib
    p2 --> res
    p7 --> ns
    p9 --> ns
    p3 --> imp
    p5 --> lu
    p5 --> vu
```
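A minimal sketch of the shared process pool from the diagram; the worker cap matches the PR's max_workers=min(cpu_count, 4), while load_library_spec is a hypothetical worker function:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional

_shared_executor: Optional[ProcessPoolExecutor] = None


def get_shared_executor() -> ProcessPoolExecutor:
    """Lazily create one pool that all library loads share."""
    global _shared_executor
    if _shared_executor is None:
        _shared_executor = ProcessPoolExecutor(
            max_workers=min(os.cpu_count() or 1, 4)
        )
    return _shared_executor


# A document's imports can then be resolved in parallel:
# futures = [get_shared_executor().submit(load_library_spec, name)
#            for name in library_imports]  # load_library_spec is hypothetical
```

Sharing one executor avoids paying process startup cost per document, which matters most during a cold start when many libraries load at once.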
Cold vs Warm Start Flow

```mermaid
flowchart TB
    subgraph startup["IDE Startup"]
        open["User Opens VS Code"]
    end
    subgraph cold["Cold Start (No Cache) ~2-4 min"]
        c1["Parse .robot files"]
        c2["Resolve imports in parallel"]
        c3["Load libraries<br/>(shared executor)"]
        c4["Build namespaces"]
        c5["Save to cache<br/>(atomic write)"]
        c1 --> c2 --> c3 --> c4 --> c5
    end
    subgraph warm["Warm Start (Cache Hit) ~10-20 sec"]
        w1["Check .cache.pkl exists"]
        w2{"Validate:<br/>mtime + size?"}
        w3["Load (meta, spec)"]
        w4["Check environment identity"]
        w5["Restore namespace"]
        w1 --> w2
        w2 -->|"Match"| w3 --> w4 --> w5
        w2 -->|"Changed"| miss["Rebuild"]
    end
    subgraph runtime["Runtime Editing"]
        r1["File changed"]
        r2["O(1) lookup affected docs"]
        r3["Targeted invalidation"]
        r4["Rebuild only affected"]
        r1 --> r2 --> r3 --> r4
    end
    open --> cold
    open --> warm
    style cold fill:#ffcccc,stroke:#cc0000
    style warm fill:#ccffcc,stroke:#00cc00
    style runtime fill:#cce5ff,stroke:#0066cc
```
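A minimal sketch of the reverse dependency maps that drive the runtime-editing path above; the map names mirror the diagram and the get_library_users naming from the commits, but the document type and invalidation hook are simplified stand-ins:

```python
from collections import defaultdict


class ReverseDependencyMaps:
    def __init__(self) -> None:
        # library/variables source -> documents that import it
        self._library_users: dict[str, set] = defaultdict(set)
        self._variables_users: dict[str, set] = defaultdict(set)

    def register_library_import(self, doc, library_source: str) -> None:
        self._library_users[library_source].add(doc)

    def get_library_users(self, library_source: str) -> set:
        """O(1): the documents to invalidate when this library changes."""
        return self._library_users.get(library_source, set())


# On a file-change event, only the affected documents are rebuilt:
# for doc in maps.get_library_users(changed_source):
#     invalidate_namespace(doc)  # hypothetical targeted-invalidation hook
```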
Cache Validation Chain

```mermaid
sequenceDiagram
    participant LS as Language Server
    participant Cache as Disk Cache
    participant FS as File System
    Note over LS,FS: Warm Start Validation
    LS->>Cache: Load .cache.pkl
    Cache-->>LS: (meta, spec) tuple
    LS->>FS: stat(source_file)
    FS-->>LS: mtime, size
    alt mtime matches
        alt size matches
            Note over LS: Skip content hash!<br/>(Fast path)
            LS->>LS: Check python_executable
            LS->>LS: Check sys_path_hash
            alt Environment matches
                LS->>LS: Restore from cache ✓
            else Environment changed
                LS->>LS: Rebuild namespace
            end
        else size differs
            LS->>FS: Read first+last 64KB
            FS-->>LS: content chunks
            LS->>LS: Compute tiered hash
            alt Hash matches
                LS->>LS: Restore from cache ✓
            else Hash differs
                LS->>LS: Rebuild namespace
            end
        end
    else mtime differs
        LS->>LS: Rebuild namespace
    end
```
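A minimal sketch of the validation chain above, with a simplified CacheMeta stand-in for NamespaceMetaData and the python_executable/sys_path_hash environment checks omitted; the tiered hash reads only the first and last 64 KB, matching the diagram:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

CHUNK = 64 * 1024  # first + last 64 KB, per the diagram


@dataclass(frozen=True)
class CacheMeta:  # simplified stand-in for NamespaceMetaData
    mtime_ns: int
    size: int
    content_hash: str


def tiered_hash(path: Path) -> str:
    """Hash only the head and tail of the file instead of its full content."""
    size = path.stat().st_size
    h = hashlib.sha256()
    with path.open("rb") as f:
        h.update(f.read(CHUNK))
        if size > 2 * CHUNK:
            f.seek(size - CHUNK)
            h.update(f.read(CHUNK))
    return h.hexdigest()


def is_cache_valid(meta: CacheMeta, source: Path) -> bool:
    """mtime+size fast path; compute the tiered hash only when size differs."""
    st = source.stat()
    if st.st_mtime_ns != meta.mtime_ns:
        return False  # mtime differs: rebuild namespace
    if st.st_size == meta.size:
        return True  # fast path: skip content hashing entirely
    return tiered_hash(source) == meta.content_hash
```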